Research Summary
1. Introduction

Rising concerns about urban air quality and its impact on public health have prompted researchers to investigate the links between environmental factors and health outcomes. This report analyzes how weather conditions relate to health risk scores across several U.S. cities during September. We first formulated a set of SMART questions about how factors such as temperature, humidity, and wind speed relate to health risk scores. We then performed an exploratory data analysis (EDA) to uncover patterns and trends in the dataset, and applied appropriate statistical tests to check whether those patterns are statistically significant. The goal of this report is to clarify the connections between environmental conditions and health.

2. Dataset Summary

2.1 Dataset Overview

The dataset contains 27,674 observations of meteorological and environmental data, including temperature, humidity, and wind speed, recorded across nine major U.S. cities for the month of September 2024. It has 43 variables, including Health_Risk_Score and Severity_Score, and provides a detailed view of daily atmospheric conditions. Key variables such as temperature, humidity, wind speed, visibility, and weather descriptions enable an in-depth analysis of how urban air quality relates to health.

Data Loading

##     datetime datetimeEpoch tempmax tempmin temp feelslikemax feelslikemin
## 1 07-09-2024    1725692400    89.0    62.1 73.3         88.6         62.1
## 2 08-09-2024    1725778800    89.0    60.0 72.4         87.9         60.0
## 3 10-09-2024    1725951600    79.4    59.6 67.8         79.4         59.6
## 4 11-09-2024    1726038000    77.3    57.6 66.3         77.3         57.6
## 5 12-09-2024    1726124400    79.2    57.8 67.4         79.2         57.8
## 6 13-09-2024    1726210800    83.2    58.9 69.6         82.2         58.9
##   feelslike  dew humidity precip precipprob precipcover snow snowdepth windgust
## 1      73.3 59.8     66.3      0          0           0    0         0     16.1
## 2      72.3 57.6     62.5      0          0           0    0         0     13.9
## 3      67.8 57.2     70.7      0          0           0    0         0     17.4
## 4      66.3 56.8     73.1      0          4           0    0         0     23.0
## 5      67.4 55.6     68.3      0          5           0    0         0     17.9
## 6      69.5 54.2     60.5      0          0           0    0         0     16.1
##   windspeed winddir pressure cloudcover visibility solarradiation solarenergy
## 1       9.2   311.1   1012.2       12.0       10.0          267.7        23.4
## 2       8.1   310.2   1012.1       15.6        9.8          279.0        24.1
## 3       9.8   290.2   1012.5       18.8       12.4          274.7        23.8
## 4      13.4   273.9   1009.6       17.3       15.0          264.0        22.6
## 5      10.7   285.8   1007.0       14.2       15.0          262.2        22.6
## 6       8.9   287.5   1007.4        5.9       15.0          263.2        22.5
##   uvindex severerisk  sunrise sunriseEpoch   sunset sunsetEpoch moonphase
## 1       9         10 06:43:31   1725716611 19:26:34  1725762394      0.16
## 2       9         10 06:44:20   1725803060 19:25:03  1725848703      0.19
## 3       9         10 06:45:59   1725975959 19:22:01  1726021321      0.25
## 4       8         10 06:46:48   1726062408 19:20:29  1726107629      0.29
## 5       8         10 06:47:38   1726148858 19:18:57  1726193937      0.32
## 6       8         10 06:48:27   1726235307 19:17:25  1726280245      0.36
##   conditions                          description      icon source     City
## 1      Clear Clear conditions throughout the day. clear-day   comb San Jose
## 2      Clear Clear conditions throughout the day. clear-day   fcst San Jose
## 3      Clear Clear conditions throughout the day. clear-day   fcst San Jose
## 4      Clear Clear conditions throughout the day. clear-day   fcst San Jose
## 5      Clear Clear conditions throughout the day. clear-day   fcst San Jose
## 6      Clear Clear conditions throughout the day. clear-day   fcst San Jose
##   Temp_Range Heat_Index Severity_Score Month Season Day_of_Week Is_Weekend
## 1       26.9    75.8425           3.41     9   Fall    Saturday       True
## 2       29.0    75.9270           3.19     9   Fall      Sunday       True
## 3       19.8    73.5164           3.54     9   Fall     Tuesday      False
## 4       19.7    72.9060           3.90     9   Fall   Wednesday      False
## 5       21.4    74.3009           3.39     9   Fall    Thursday      False
## 6       24.3    75.8192           3.21     9   Fall      Friday      False
##   Health_Risk_Score
## 1           9.84508
## 2           9.58645
## 3           9.85442
## 4          10.14150
## 5           9.74546
## 6           9.52397
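The loading code is not reproduced in this report. One detail worth noting is that `datetime` uses a day-month-year layout: `07-09-2024` is 7 September 2024, which agrees with `Day_of_Week` = Saturday in the first row. The Python sketch below stands in for the R code; the helper names are hypothetical, and the fixed UTC-7 offset for San Jose in September is an assumption used to turn `datetimeEpoch` back into a calendar date.

```python
from datetime import date, datetime, timedelta, timezone


def parse_day(value: str) -> date:
    """Parse a `datetime` string such as '07-09-2024' as day-month-year."""
    return datetime.strptime(value, "%d-%m-%Y").date()


def epoch_to_local_date(epoch: int, utc_offset_hours: int) -> date:
    """Convert a Unix timestamp to a calendar date at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch, tz=tz).date()


if __name__ == "__main__":
    day = parse_day("07-09-2024")
    print(day, day.strftime("%A"))  # 2024-09-07 Saturday
    # Assumed offset: San Jose observes PDT (UTC-7) in September.
    print(epoch_to_local_date(1725692400, -7))  # 2024-09-07
```

Parsing with a month-first format instead would silently read `07-09-2024` as 9 July, so the format string matters.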

Data Description and Summary

## [1] "Row Count: 27674 Column Count: 43"
## 'data.frame':    27674 obs. of  43 variables:
##  $ datetime         : chr  "07-09-2024" "08-09-2024" "10-09-2024" "11-09-2024" ...
##  $ datetimeEpoch    : num  1725692400 1725778800 1725951600 1726038000 1726124400 ...
##  $ tempmax          : num  89 89 79.4 77.3 79.2 83.2 81.4 78.3 81.2 82.3 ...
##  $ tempmin          : num  62.1 60 59.6 57.6 57.8 58.9 59.4 59.8 59.3 60.9 ...
##  $ temp             : num  73.3 72.4 67.8 66.3 67.4 69.6 68.8 66.8 68.9 68.5 ...
##  $ feelslikemax     : num  88.6 87.9 79.4 77.3 79.2 82.2 81.3 78.3 79.6 80.2 ...
##  $ feelslikemin     : num  62.1 60 59.6 57.6 57.8 58.9 59.4 59.8 59.3 60.9 ...
##  $ feelslike        : num  73.3 72.3 67.8 66.3 67.4 69.5 68.8 66.8 68.6 68.4 ...
##  $ dew              : num  59.8 57.6 57.2 56.8 55.6 54.2 55.5 47.3 44.4 46.6 ...
##  $ humidity         : num  66.3 62.5 70.7 73.1 68.3 60.5 64.2 52.9 43.5 47.6 ...
##  $ precip           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ precipprob       : num  0 0 0 4 5 0 1 3.2 0 0 ...
##  $ precipcover      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ snow             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ snowdepth        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ windgust         : num  16.1 13.9 17.4 23 17.9 16.1 16.6 9.8 7.8 8.5 ...
##  $ windspeed        : num  9.2 8.1 9.8 13.4 10.7 8.9 9.8 8.9 8.1 10.3 ...
##  $ winddir          : num  311 310 290 274 286 ...
##  $ pressure         : num  1012 1012 1012 1010 1007 ...
##  $ cloudcover       : num  12 15.6 18.8 17.3 14.2 5.9 8.9 3.1 8.6 14.2 ...
##  $ visibility       : num  10 9.8 12.4 15 15 15 14.9 14.9 15 14.9 ...
##  $ solarradiation   : num  268 279 275 264 262 ...
##  $ solarenergy      : num  23.4 24.1 23.8 22.6 22.6 22.5 22.3 22 21.7 21.3 ...
##  $ uvindex          : num  9 9 9 8 8 8 8 8 8 8 ...
##  $ severerisk       : num  10 10 10 10 10 10 10 10 10 10 ...
##  $ sunrise          : chr  "06:43:31" "06:44:20" "06:45:59" "06:46:48" ...
##  $ sunriseEpoch     : num  1725716611 1725803060 1725975959 1726062408 1726148858 ...
##  $ sunset           : chr  "19:26:34" "19:25:03" "19:22:01" "19:20:29" ...
##  $ sunsetEpoch      : num  1725762394 1725848703 1726021321 1726107629 1726193937 ...
##  $ moonphase        : num  0.16 0.19 0.25 0.29 0.32 0.36 0.39 0.42 0.46 0.5 ...
##  $ conditions       : chr  "Clear" "Clear" "Clear" "Clear" ...
##  $ description      : chr  "Clear conditions throughout the day." "Clear conditions throughout the day." "Clear conditions throughout the day." "Clear conditions throughout the day." ...
##  $ icon             : chr  "clear-day" "clear-day" "clear-day" "clear-day" ...
##  $ source           : chr  "comb" "fcst" "fcst" "fcst" ...
##  $ City             : chr  "San Jose" "San Jose" "San Jose" "San Jose" ...
##  $ Temp_Range       : num  26.9 29 19.8 19.7 21.4 24.3 22 18.5 21.9 21.4 ...
##  $ Heat_Index       : num  75.8 75.9 73.5 72.9 74.3 ...
##  $ Severity_Score   : num  3.41 3.19 3.54 3.9 3.39 3.21 3.26 2.58 2.38 2.45 ...
##  $ Month            : int  9 9 9 9 9 9 9 9 9 9 ...
##  $ Season           : chr  "Fall" "Fall" "Fall" "Fall" ...
##  $ Day_of_Week      : chr  "Saturday" "Sunday" "Tuesday" "Wednesday" ...
##  $ Is_Weekend       : chr  "True" "True" "False" "False" ...
##  $ Health_Risk_Score: num  9.85 9.59 9.85 10.14 9.75 ...

2.2 Data Source

The dataset was obtained from Kaggle and presumably originates from a weather forecasting or environmental data aggregation service.

2.3 Scope and SMART Questions

To guide our investigation, we formulated the following SMART questions:
1. How do health risk scores differ between weekdays and weekends?
2. How do key meteorological factors vary across cities?
3. Which meteorological factors significantly influence health risk scores?
4. How might changes in humidity affect overall health?
5. How do wind speed and health risk scores vary over the days of the month?

These questions will help us effectively explore the connections between environmental conditions and health outcomes.

3. Limitations of the Dataset

The dataset has temporal and spatial limitations: it covers only September 2024 and a limited number of cities, so it may not represent weather patterns, or the associated health risks, in other months, years, or locations. It also lacks the variables needed to understand how the health risk scores were calculated, so significant factors that affect health may be overlooked. In addition, some variables contain physically implausible negative values (for example, precip, precipprob, and cloudcover in the summary statistics in Section 4.1).

4. Exploratory Data Analysis

4.1 Appropriate Datatype Conversion

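The categorical character columns (conditions, description, icon, source, City, Season, Day_of_Week, Is_Weekend) and the integer Month column were converted to factors. After conversion, of the 43 variables, 9 are factors, 30 are numeric, 1 (snow) is integer, and 3 (datetime, sunrise, sunset) are character.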

##          datetime     datetimeEpoch           tempmax           tempmin 
##       "character"         "numeric"         "numeric"         "numeric" 
##              temp      feelslikemax      feelslikemin         feelslike 
##         "numeric"         "numeric"         "numeric"         "numeric" 
##               dew          humidity            precip        precipprob 
##         "numeric"         "numeric"         "numeric"         "numeric" 
##       precipcover              snow         snowdepth          windgust 
##         "numeric"         "integer"         "numeric"         "numeric" 
##         windspeed           winddir          pressure        cloudcover 
##         "numeric"         "numeric"         "numeric"         "numeric" 
##        visibility    solarradiation       solarenergy           uvindex 
##         "numeric"         "numeric"         "numeric"         "numeric" 
##        severerisk           sunrise      sunriseEpoch            sunset 
##         "numeric"       "character"         "numeric"       "character" 
##       sunsetEpoch         moonphase        conditions       description 
##         "numeric"         "numeric"          "factor"          "factor" 
##              icon            source              City        Temp_Range 
##          "factor"          "factor"          "factor"         "numeric" 
##        Heat_Index    Severity_Score             Month            Season 
##         "numeric"         "numeric"          "factor"          "factor" 
##       Day_of_Week        Is_Weekend Health_Risk_Score 
##          "factor"          "factor"         "numeric"
##    datetime         datetimeEpoch           tempmax        tempmin    
##  Length:27674       Min.   :1725642512   Min.   :70.2   Min.   :50.5  
##  Class :character   1st Qu.:1725900770   1st Qu.:77.3   1st Qu.:58.4  
##  Mode  :character   Median :1726142273   Median :81.4   Median :61.2  
##                     Mean   :1726191240   Mean   :80.6   Mean   :61.4  
##                     3rd Qu.:1726440108   3rd Qu.:83.2   3rd Qu.:64.6  
##                     Max.   :1726932525   Max.   :90.5   Max.   :74.9  
##                                                                       
##       temp       feelslikemax   feelslikemin    feelslike         dew      
##  Min.   :60.1   Min.   :68.8   Min.   :50.2   Min.   :59.3   Min.   :41.1  
##  1st Qu.:66.5   1st Qu.:77.3   1st Qu.:58.0   1st Qu.:66.6   1st Qu.:47.5  
##  Median :70.5   Median :81.0   Median :61.0   Median :70.4   Median :53.8  
##  Mean   :69.9   Mean   :80.2   Mean   :61.4   Mean   :70.0   Mean   :53.1  
##  3rd Qu.:73.4   3rd Qu.:83.1   3rd Qu.:64.7   3rd Qu.:73.3   3rd Qu.:58.5  
##  Max.   :79.7   Max.   :90.3   Max.   :76.5   Max.   :80.1   Max.   :65.9  
##                                                                            
##     humidity        precip            precipprob      precipcover     
##  Min.   :38.5   Min.   :-0.020883   Min.   :-5.902   Min.   :-1.4813  
##  1st Qu.:51.9   1st Qu.:-0.002877   1st Qu.:-0.286   1st Qu.:-0.2819  
##  Median :57.5   Median : 0.000000   Median : 1.000   Median : 0.0000  
##  Mean   :57.3   Mean   : 0.000908   Mean   : 2.355   Mean   : 0.0497  
##  3rd Qu.:63.0   3rd Qu.: 0.005045   3rd Qu.: 3.282   3rd Qu.: 0.3923  
##  Max.   :76.6   Max.   : 0.024597   Max.   :20.811   Max.   : 2.2277  
##                                                                       
##       snow     snowdepth    windgust       windspeed        winddir     
##  Min.   :0   Min.   :0   Min.   : 3.50   Min.   : 4.89   Min.   : 21.5  
##  1st Qu.:0   1st Qu.:0   1st Qu.: 9.21   1st Qu.: 8.25   1st Qu.:164.0  
##  Median :0   Median :0   Median :13.69   Median : 9.20   Median :200.9  
##  Mean   :0   Mean   :0   Mean   :13.11   Mean   : 9.33   Mean   :207.5  
##  3rd Qu.:0   3rd Qu.:0   3rd Qu.:16.28   3rd Qu.:10.43   3rd Qu.:279.4  
##  Max.   :0   Max.   :0   Max.   :23.42   Max.   :15.00   Max.   :330.2  
##                                                                         
##     pressure      cloudcover      visibility    solarradiation  solarenergy  
##  Min.   :1006   Min.   :-4.40   Min.   : 9.53   Min.   :214    Min.   :18.7  
##  1st Qu.:1012   1st Qu.: 3.43   1st Qu.:11.93   1st Qu.:250    1st Qu.:21.6  
##  Median :1016   Median :10.26   Median :14.90   Median :259    Median :22.3  
##  Mean   :1016   Mean   :11.03   Mean   :13.63   Mean   :260    Mean   :22.5  
##  3rd Qu.:1021   3rd Qu.:16.63   3rd Qu.:15.05   3rd Qu.:268    3rd Qu.:23.3  
##  Max.   :1031   Max.   :29.63   Max.   :15.71   Max.   :313    Max.   :26.8  
##                                                                              
##     uvindex        severerisk      sunrise           sunriseEpoch       
##  Min.   : 5.72   Min.   : 8.21   Length:27674       Min.   :1725673373  
##  1st Qu.: 6.89   1st Qu.: 9.58   Class :character   1st Qu.:1725957500  
##  Median : 7.98   Median :10.00   Mode  :character   Median :1726170046  
##  Mean   : 7.68   Mean   :10.06                      Mean   :1726215696  
##  3rd Qu.: 8.37   3rd Qu.:10.63                      3rd Qu.:1726451108  
##  Max.   :10.00   Max.   :12.06                      Max.   :1726990562  
##                                                                         
##     sunset           sunsetEpoch           moonphase    
##  Length:27674       Min.   :1725711455   Min.   :0.141  
##  Class :character   1st Qu.:1725975203   1st Qu.:0.231  
##  Mode  :character   Median :1726209348   Median :0.325  
##                     Mean   :1726260189   Mean   :0.346  
##                     3rd Qu.:1726496196   3rd Qu.:0.450  
##                     Max.   :1727027186   Max.   :0.647  
##                                                         
##             conditions                                  description   
##  Clear           :22826   Becoming cloudy in the afternoon.   :  404  
##  Partially cloudy: 4848   Clear conditions throughout the day.:22422  
##                           Clearing in the afternoon.          :  404  
##                           Partly cloudy throughout the day.   : 4444  
##                                                                       
##                                                                       
##                                                                       
##                 icon        source                 City        Temp_Range  
##  clear-day        :22826   comb:  505   San Jose     :7777   Min.   : 8.1  
##  partly-cloudy-day: 4848   fcst:27169   New York City:6060   1st Qu.:16.9  
##                                         Philadelphia :3939   Median :19.8  
##                                         Chicago      :3838   Mean   :19.3  
##                                         Los Angeles  :3838   3rd Qu.:21.7  
##                                         Dallas       : 909   Max.   :29.8  
##                                         (Other)      :1313                 
##    Heat_Index   Severity_Score Month      Season         Day_of_Week  
##  Min.   :72.5   Min.   :1.85   9:27674   Fall:27674   Friday   :3737  
##  1st Qu.:76.1   1st Qu.:2.44                          Monday   :4343  
##  Median :77.0   Median :2.72                          Saturday :3333  
##  Mean   :77.1   Mean   :2.85                          Sunday   :5858  
##  3rd Qu.:78.0   3rd Qu.:3.23                          Thursday :3535  
##  Max.   :81.6   Max.   :4.32                          Tuesday  :2727  
##                                                       Wednesday:4141  
##  Is_Weekend    Health_Risk_Score
##  False:18483   Min.   : 8.41    
##  True : 9191   1st Qu.: 9.06    
##                Median : 9.28    
##                Mean   : 9.34    
##                3rd Qu.: 9.59    
##                Max.   :10.70    
## 

4.2 Dropping unwanted columns

The Season, snow, snowdepth, and Month columns each contain only a single value (see the unique-value counts below), so they were dropped, leaving 39 columns.

##          datetime     datetimeEpoch           tempmax           tempmin 
##                15               262               268               264 
##              temp      feelslikemax      feelslikemin         feelslike 
##               270               270               264               271 
##               dew          humidity            precip        precipprob 
##               272               269               234               243 
##       precipcover              snow         snowdepth          windgust 
##               234                 1                 1               261 
##         windspeed           winddir          pressure        cloudcover 
##               252               273               271               271 
##        visibility    solarradiation       solarenergy           uvindex 
##               242               274               260               238 
##        severerisk           sunrise      sunriseEpoch            sunset 
##               234                45               274                44 
##       sunsetEpoch         moonphase        conditions       description 
##               274               250                 2                 4 
##              icon            source              City        Temp_Range 
##                 2                 2                 9               269 
##        Heat_Index    Severity_Score             Month            Season 
##               274               268                 1                 1 
##       Day_of_Week        Is_Weekend Health_Risk_Score 
##                 7                 2             27674
##     datetime datetimeEpoch tempmax tempmin temp feelslikemax feelslikemin
## 1 07-09-2024    1725692400    89.0    62.1 73.3         88.6         62.1
## 2 08-09-2024    1725778800    89.0    60.0 72.4         87.9         60.0
## 3 10-09-2024    1725951600    79.4    59.6 67.8         79.4         59.6
## 4 11-09-2024    1726038000    77.3    57.6 66.3         77.3         57.6
## 5 12-09-2024    1726124400    79.2    57.8 67.4         79.2         57.8
## 6 13-09-2024    1726210800    83.2    58.9 69.6         82.2         58.9
##   feelslike  dew humidity precip precipprob precipcover windgust windspeed
## 1      73.3 59.8     66.3      0          0           0     16.1       9.2
## 2      72.3 57.6     62.5      0          0           0     13.9       8.1
## 3      67.8 57.2     70.7      0          0           0     17.4       9.8
## 4      66.3 56.8     73.1      0          4           0     23.0      13.4
## 5      67.4 55.6     68.3      0          5           0     17.9      10.7
## 6      69.5 54.2     60.5      0          0           0     16.1       8.9
##   winddir pressure cloudcover visibility solarradiation solarenergy uvindex
## 1   311.1   1012.2       12.0       10.0          267.7        23.4       9
## 2   310.2   1012.1       15.6        9.8          279.0        24.1       9
## 3   290.2   1012.5       18.8       12.4          274.7        23.8       9
## 4   273.9   1009.6       17.3       15.0          264.0        22.6       8
## 5   285.8   1007.0       14.2       15.0          262.2        22.6       8
## 6   287.5   1007.4        5.9       15.0          263.2        22.5       8
##   severerisk  sunrise sunriseEpoch   sunset sunsetEpoch moonphase conditions
## 1         10 06:43:31   1725716611 19:26:34  1725762394      0.16      Clear
## 2         10 06:44:20   1725803060 19:25:03  1725848703      0.19      Clear
## 3         10 06:45:59   1725975959 19:22:01  1726021321      0.25      Clear
## 4         10 06:46:48   1726062408 19:20:29  1726107629      0.29      Clear
## 5         10 06:47:38   1726148858 19:18:57  1726193937      0.32      Clear
## 6         10 06:48:27   1726235307 19:17:25  1726280245      0.36      Clear
##                            description      icon source     City Temp_Range
## 1 Clear conditions throughout the day. clear-day   comb San Jose       26.9
## 2 Clear conditions throughout the day. clear-day   fcst San Jose       29.0
## 3 Clear conditions throughout the day. clear-day   fcst San Jose       19.8
## 4 Clear conditions throughout the day. clear-day   fcst San Jose       19.7
## 5 Clear conditions throughout the day. clear-day   fcst San Jose       21.4
## 6 Clear conditions throughout the day. clear-day   fcst San Jose       24.3
##   Heat_Index Severity_Score Day_of_Week Is_Weekend Health_Risk_Score
## 1    75.8425           3.41    Saturday       True           9.84508
## 2    75.9270           3.19      Sunday       True           9.58645
## 3    73.5164           3.54     Tuesday      False           9.85442
## 4    72.9060           3.90   Wednesday      False          10.14150
## 5    74.3009           3.39    Thursday      False           9.74546
## 6    75.8192           3.21      Friday      False           9.52397
## [1] 27674    39
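The unique-value counts above identify the columns to drop. A minimal Python sketch of the same rule, assuming rows are held as dictionaries; the helper names are hypothetical.

```python
def unique_counts(rows, columns):
    """Number of distinct values in each column."""
    return {col: len({row[col] for row in rows}) for col in columns}


def drop_constant_columns(rows, columns):
    """Drop every column that holds a single value; return (rows, kept columns)."""
    counts = unique_counts(rows, columns)
    kept = [col for col in columns if counts[col] > 1]
    return [{col: row[col] for col in kept} for row in rows], kept


if __name__ == "__main__":
    rows = [
        {"temp": 73.3, "snow": 0, "Month": 9, "Season": "Fall"},
        {"temp": 72.4, "snow": 0, "Month": 9, "Season": "Fall"},
    ]
    trimmed, kept = drop_constant_columns(rows, ["temp", "snow", "Month", "Season"])
    print(kept)     # ['temp']
    print(trimmed)  # [{'temp': 73.3}, {'temp': 72.4}]
```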

4.3 Duplicates and missing values removal

The dataset contains no missing values and no duplicate rows, so no observations were removed in this step.

## [1] "Number of duplicate rows: 0"
##          datetime     datetimeEpoch           tempmax           tempmin 
##                 0                 0                 0                 0 
##              temp      feelslikemax      feelslikemin         feelslike 
##                 0                 0                 0                 0 
##               dew          humidity            precip        precipprob 
##                 0                 0                 0                 0 
##       precipcover          windgust         windspeed           winddir 
##                 0                 0                 0                 0 
##          pressure        cloudcover        visibility    solarradiation 
##                 0                 0                 0                 0 
##       solarenergy           uvindex        severerisk           sunrise 
##                 0                 0                 0                 0 
##      sunriseEpoch            sunset       sunsetEpoch         moonphase 
##                 0                 0                 0                 0 
##        conditions       description              icon            source 
##                 0                 0                 0                 0 
##              City        Temp_Range        Heat_Index    Severity_Score 
##                 0                 0                 0                 0 
##       Day_of_Week        Is_Weekend Health_Risk_Score 
##                 0                 0                 0
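A minimal Python sketch of both checks, assuming rows are dictionaries. Treating `None`, empty strings, `"NA"`, and float NaN as missing is an assumption, and the helper names are hypothetical.

```python
import math

MISSING_MARKERS = ("", "NA")  # assumed textual markers for missing values


def is_missing(value):
    """True for None, an assumed missing marker, or a float NaN."""
    if value is None:
        return True
    if isinstance(value, float):
        return math.isnan(value)
    return value in MISSING_MARKERS


def count_duplicate_rows(rows):
    """Count rows that exactly repeat an earlier row."""
    seen, duplicates = set(), 0
    for row in rows:
        # Dict keys are unique, so sorting never has to compare values.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


def missing_per_column(rows, columns):
    """Number of missing entries in each column (absent keys count as missing)."""
    return {col: sum(is_missing(row.get(col)) for row in rows) for col in columns}


if __name__ == "__main__":
    rows = [
        {"City": "San Jose", "temp": 73.3},
        {"City": "San Jose", "temp": 73.3},
        {"City": "Dallas", "temp": None},
    ]
    print(count_duplicate_rows(rows))                   # 1
    print(missing_per_column(rows, ["City", "temp"]))  # {'City': 0, 'temp': 1}
```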

4.4 Outlier Removal

After outlier removal, 18,885 of the original 27,674 observations remain; 8,789 rows (about 32%) were dropped.

##     datetime datetimeEpoch tempmax tempmin temp feelslikemax feelslikemin
## 1 07-09-2024    1725692400    89.0    62.1 73.3         88.6         62.1
## 3 10-09-2024    1725951600    79.4    59.6 67.8         79.4         59.6
## 5 12-09-2024    1726124400    79.2    57.8 67.4         79.2         57.8
## 6 13-09-2024    1726210800    83.2    58.9 69.6         82.2         58.9
## 7 14-09-2024    1726297200    81.4    59.4 68.8         81.3         59.4
## 8 15-09-2024    1726383600    78.3    59.8 66.8         78.3         59.8
##   feelslike  dew humidity precip precipprob precipcover windgust windspeed
## 1      73.3 59.8     66.3      0        0.0           0     16.1       9.2
## 3      67.8 57.2     70.7      0        0.0           0     17.4       9.8
## 5      67.4 55.6     68.3      0        5.0           0     17.9      10.7
## 6      69.5 54.2     60.5      0        0.0           0     16.1       8.9
## 7      68.8 55.5     64.2      0        1.0           0     16.6       9.8
## 8      66.8 47.3     52.9      0        3.2           0      9.8       8.9
##   winddir pressure cloudcover visibility solarradiation solarenergy uvindex
## 1   311.1   1012.2       12.0       10.0          267.7        23.4       9
## 3   290.2   1012.5       18.8       12.4          274.7        23.8       9
## 5   285.8   1007.0       14.2       15.0          262.2        22.6       8
## 6   287.5   1007.4        5.9       15.0          263.2        22.5       8
## 7   263.1   1010.3        8.9       14.9          259.1        22.3       8
## 8   256.5   1011.5        3.1       14.9          255.1        22.0       8
##   severerisk  sunrise sunriseEpoch   sunset sunsetEpoch moonphase conditions
## 1         10 06:43:31   1725716611 19:26:34  1725762394      0.16      Clear
## 3         10 06:45:59   1725975959 19:22:01  1726021321      0.25      Clear
## 5         10 06:47:38   1726148858 19:18:57  1726193937      0.32      Clear
## 6         10 06:48:27   1726235307 19:17:25  1726280245      0.36      Clear
## 7         10 06:49:17   1726321757 19:15:53  1726366553      0.39      Clear
## 8         10 06:50:06   1726408206 19:14:21  1726452861      0.42      Clear
##                            description      icon source     City Temp_Range
## 1 Clear conditions throughout the day. clear-day   comb San Jose       26.9
## 3 Clear conditions throughout the day. clear-day   fcst San Jose       19.8
## 5 Clear conditions throughout the day. clear-day   fcst San Jose       21.4
## 6 Clear conditions throughout the day. clear-day   fcst San Jose       24.3
## 7 Clear conditions throughout the day. clear-day   fcst San Jose       22.0
## 8 Clear conditions throughout the day. clear-day   fcst San Jose       18.5
##   Heat_Index Severity_Score Day_of_Week Is_Weekend Health_Risk_Score
## 1    75.8425           3.41    Saturday       True           9.84508
## 3    73.5164           3.54     Tuesday      False           9.85442
## 5    74.3009           3.39    Thursday      False           9.74546
## 6    75.8192           3.21      Friday      False           9.52397
## 7    75.1635           3.26    Saturday       True           9.60927
## 8    77.5603           2.58      Sunday       True           9.12085
## [1] "Row Count: 18885 Column Count: 39"
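The report does not state which outlier rule was applied. A common choice is the 1.5 × IQR rule, sketched below in Python as an assumption rather than as the method actually used; the helper names are hypothetical. Here `method="inclusive"` in `statistics.quantiles` uses the same interpolation as R's default `quantile()` (type 7). Because a row is dropped when any one of the screened columns falls outside its fences, screening many columns at once is one way a filter can remove as large a share of rows as it did here.

```python
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Lower and upper fences Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" interpolates like R's default quantile() (type 7).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows, columns, k=1.5):
    """Keep rows whose value in every listed column lies within that column's fences."""
    fences = {col: iqr_fences([row[col] for row in rows], k) for col in columns}
    return [
        row
        for row in rows
        if all(low <= row[col] <= high for col, (low, high) in fences.items())
    ]


if __name__ == "__main__":
    rows = [{"windgust": g} for g in [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]]
    print(iqr_fences([r["windgust"] for r in rows]))  # (-3.5, 14.5)
    print(len(remove_outliers(rows, ["windgust"])))    # 9
```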

4.5 Univariate Analysis

Most observations have health risk scores around 9.0, with a small subset showing elevated scores near 10.0.


4.6 Bivariate and Multivariate Analysis

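Health Risk Scores differ significantly between weekends and weekdays (Welch two-sample t-test, t = 29.51, p < 2e-16). The mean score is higher on weekends (9.36) than on weekdays (9.17), a difference of about 0.19 points (95% CI: 0.175 to 0.200) that is statistically significant but modest in size.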

## [1] "Is there a statistically significant difference in the Health Risk Score on weekends compared to weekdays?"
## Hypothesis Statements:
## Null Hypothesis (H0): There is no significant difference in the Health Risk Score between weekends and weekdays.
## Alternative Hypothesis (H1): There is a significant difference in the Health Risk Score between weekends and weekdays.
## 
##  Welch Two Sample t-test
## 
## data:  weekend_scores and weekday_scores
## t = 29.51, df = 5925, p-value <2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.175105 0.200027
## sample estimates:
## mean of x mean of y 
##   9.36019   9.17262
## 
## Interpretation:
## Reject the null hypothesis: There is a statistically significant difference in the Health Risk Score between weekends and weekdays.
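R's `t.test()` runs Welch's unequal-variance test by default, which is what the output above reports. The Python sketch below computes the same t statistic and Welch-Satterthwaite degrees of freedom; the helper name is hypothetical. For the p-value it substitutes a standard normal approximation, which is only reasonable for large df such as the 5,925 here; an exact p-value needs the t distribution (for example `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_test(x, y):
    """Welch's t statistic, Welch-Satterthwaite df, and an approximate two-sided p-value."""
    se2_x = variance(x) / len(x)  # squared standard error of mean(x)
    se2_y = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (
        se2_x ** 2 / (len(x) - 1) + se2_y ** 2 / (len(y) - 1)
    )
    # Normal approximation to the t distribution: reasonable only for large df.
    p_value = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_value


if __name__ == "__main__":
    # Toy samples, not the report's data.
    t, df, _ = welch_t_test([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
    print(round(t, 3), round(df, 3))  # -1.897 5.882
```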

## [1] "The weak negative correlation of -0.24 suggests that higher temperatures are slightly associated with lower health risk scores, though the relationship is not strong"

San Jose has the highest health risk score and wind gusts, while Philadelphia shows the lowest health risk score and Los Angeles the lowest wind gusts.

A correlation matrix was used to visualize the strength and direction of the pairwise relationships between variables.

Wind gusts are positively correlated with health risk scores, whereas temperature shows a weaker, slightly negative correlation (r = -0.24). Wind-related factors (gusts and speed) and humidity are associated with higher health risk scores, while visibility, sunrise/sunset times, and moon phase are associated with lower scores.

## [1] "Does wind gust have a significant impact on the Health Risk Score?"
## [1] "Correlation between windgust and Health risk : 0.719211489784877"
## [1] "Correlation between severity score and Health risk : 0.79812942759995"
## [1] "Correlation between wind speed and Health risk : 0.586553375240766"
## [1] "Correlation between humidity and Health risk : 0.505210350812946"
## [1] "Correlation between humidity and windgust : 0.505210350812946"

## [1] "Correlation between visibility and Health risk : 0.861818410192233"
## [1] "Correlation between sunsetEpoch and Health risk : 0.79812942759995"
## [1] "Correlation between sunriseEpoch and Health risk : 0.586553375240766"
## [1] "Correlation between moonphase and Health risk : 0.586553375240766"
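Several coefficients printed above repeat one another exactly: wind speed, sunriseEpoch, and moonphase all show 0.5866, sunsetEpoch matches severity score at 0.7981, and humidity shows 0.5052 against both windgust and health risk. These values are worth re-checking against the columns actually used in each computation. The Python sketch below derives every printed label from the same column name used to compute it, so a label cannot drift from its value; the helper names are hypothetical, and the sample rows are the first three rows shown in Data Loading.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlations_with(rows, target, predictors):
    """Pearson r of `target` against each predictor, keyed by the predictor's own name."""
    y = [row[target] for row in rows]
    return {name: pearson([row[name] for row in rows], y) for name in predictors}


if __name__ == "__main__":
    rows = [
        {"windgust": 16.1, "visibility": 10.0, "Health_Risk_Score": 9.84508},
        {"windgust": 13.9, "visibility": 9.8, "Health_Risk_Score": 9.58645},
        {"windgust": 17.4, "visibility": 12.4, "Health_Risk_Score": 9.85442},
    ]
    results = correlations_with(rows, "Health_Risk_Score", ["windgust", "visibility"])
    for name, r in results.items():
        print(f"Correlation between {name} and Health_Risk_Score: {r:.3f}")
```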

## # A tibble: 5 × 6
##   City    Avg_Dew_Point Avg_Humidity Avg_Pressure Total_Precipitation pressure_a
##   <fct>           <dbl>        <dbl>        <dbl>               <dbl>      <dbl>
## 1 Chicago          52.4         52.2        1018.               1.45    3391525.
## 2 Los An…          45.8         50.6        1012.              10.2     3678768.
## 3 New Yo…          57.0         60.6        1021.               0.232   3590429.
## 4 Philad…          59.0         63.2        1022.               2.99    2374491.
## 5 San Jo…          51.9         59.3        1010.              -1.52    6143571.
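The per-city table above is a grouped summary (in R, typically `dplyr::group_by()` followed by `summarise()`). Below is a Python sketch of the same kind of aggregation; the helper name is hypothetical, and the `Avg_`/`Total_` prefixes only mirror the table's column names. San Jose's negative `Total_Precipitation` follows from the negative `precip` values visible in the summary statistics of Section 4.1 (minimum -0.0209): summing them produces a negative total.

```python
from collections import defaultdict
from statistics import mean


def summarise_by(rows, key, mean_cols=(), sum_cols=()):
    """Group rows by `key`, then average `mean_cols` and total `sum_cols` per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    summary = {}
    for group in sorted(groups):
        members = groups[group]
        stats = {f"Avg_{col}": mean(r[col] for r in members) for col in mean_cols}
        stats.update({f"Total_{col}": sum(r[col] for r in members) for col in sum_cols})
        summary[group] = stats
    return summary


if __name__ == "__main__":
    # Toy rows, not the report's data.
    rows = [
        {"City": "Chicago", "humidity": 50.0, "precip": 0.5},
        {"City": "Chicago", "humidity": 54.0, "precip": 0.25},
        {"City": "San Jose", "humidity": 60.0, "precip": -0.02},
    ]
    print(summarise_by(rows, "City", mean_cols=["humidity"], sum_cols=["precip"]))
    # {'Chicago': {'Avg_humidity': 52.0, 'Total_precip': 0.75}, 'San Jose': {'Avg_humidity': 60.0, 'Total_precip': -0.02}}
```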

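Average health risk scores and wind gusts fluctuate similarly over time, peaking around September 7 and 8 and declining sharply by mid-September. This co-movement is consistent with the strong positive correlation between wind gusts and health risk scores (r = 0.72), although it does not by itself show that wind gusts cause higher scores.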

## [1] "How does the windgust vary over dates?"
## # A tibble: 6 × 2
##   datetime   Average_HRS
##   <date>           <dbl>
## 1 2024-09-07        15.9
## 2 2024-09-08        17.5
## 3 2024-09-09        15.6
## 4 2024-09-10        15.2
## 5 2024-09-11        11.2
## 6 2024-09-12        13.4

## [1] "How does the Health Risk Score vary over dates?"
## # A tibble: 6 × 2
##   datetime   Average_HRS
##   <date>           <dbl>
## 1 2024-09-07        9.83
## 2 2024-09-08        9.80
## 3 2024-09-09        9.22
## 4 2024-09-10        9.42
## 5 2024-09-11        9.01
## 6 2024-09-12        9.30

## # A tibble: 6 × 3
## # Groups:   City [2]
##   City        datetime   Average_HRS
##   <fct>       <date>           <dbl>
## 1 Chicago     2024-09-08        9.75
## 2 Chicago     2024-09-09        9.14
## 3 Chicago     2024-09-10        9.06
## 4 Chicago     2024-09-11        9.15
## 5 Chicago     2024-09-12        9.26
## 6 Los Angeles 2024-09-15        8.72

5. Conclusion and EDA Insights

Health Risk Variation: Mean health risk scores are higher on weekends (9.36) than on weekdays (9.17); the difference is statistically significant but modest.

City Comparison: San Jose has the highest health risk scores and wind gusts, while Philadelphia has the lowest health risk scores and Los Angeles the lowest wind gusts.

Meteorological Impact: Wind gusts are strongly positively correlated with health risk scores (r = 0.72), while temperature shows a weaker negative correlation (r = -0.24).

Humidity and Health: Humidity has a moderate positive correlation with health risk scores (r = 0.51).

Key Factors: Wind-related factors are associated with higher health risk scores, while visibility and sunrise/sunset times are associated with lower scores.

6. Further Research and Limitations

Standardizing and validating the health risk data sources, and expanding the analysis to more cities and a longer time period.

Conducting time-series analyses to explore the relationship between seasonal meteorological factors and health risks.

Developing strategies to mitigate the adverse health impacts of extreme weather conditions, especially in areas with high wind or humidity levels.

7. References